Search results for "Survey sampling"

showing 10 items of 24 documents

Penalization and data reduction of auxiliary variables in survey sampling

2012

Survey sampling techniques are useful for estimating population parameters such as the population total when a large-dimensional auxiliary data set is available. This thesis deals with the estimation of the population total in the presence of an ill-conditioned large data set. In the first chapter, we give some basic definitions that will be used in the later chapters. The Horvitz-Thompson estimator is defined as an estimator which does not use auxiliary variables. The calibration technique is then defined to incorporate the auxiliary variables in order to improve the estimation of population totals for a fixed sample size. The second chapter is a part of a review article about ridge re…

Model-assisted estimator; Ridge regression; [MATH.MATH-GM] Mathematics [math]/General Mathematics [math.GM]; Principal component calibration; Penalized calibration; Model-based estimator; Survey sampling; Multicollinearity; Horvitz-Thompson estimator
researchProduct

B-Spline Estimation in a Survey Sampling Framework

2021

Nonparametric regression models have been used increasingly in recent years to model survey data and to incorporate auxiliary information efficiently in order to improve the estimation of totals, means or other study parameters such as the Gini index or the poverty rate. B-spline nonparametric regression has the benefit of being very flexible in modeling nonlinear survey data while keeping many similarities and properties of the classical linear regression. This method proved to be efficient for deriving a unique system of weights, which makes it possible to estimate many study parameters simultaneously and efficiently. Applications on real and simulated survey data showed its high efficiency. This …

Estimation; Statistics::Theory; Computer science; Consistency (statistics); B-spline; Linear regression; Statistics; Statistics::Methodology; Survey data collection; Estimator; Survey sampling; Nonparametric regression
researchProduct

Estimating with kernel smoothers the mean of functional data in a finite population setting. A note on variance estimation in presence of partially o…

2014

In the near future, millions of load curves measuring the electricity consumption of French households on fine time grids (probably half-hourly) will be available. All these collected load curves represent a huge amount of information which could be exploited using survey sampling techniques. In particular, the total consumption of a specific customer group (for example all the customers of an electricity supplier) could be estimated using unequal probability random sampling methods. Unfortunately, data collection may undergo technical problems resulting in missing values. In this paper we study a new estimation method for the mean curve in the presence of missing values which consists in…

FOS: Computer and information sciences; Statistics and Probability; Population; Ratio estimator; Linearization; 01 natural sciences; Survey sampling; Horvitz-Thompson estimator; Methodology (stat.ME); 010104 statistics & probability; Hájek estimator; 0502 economics and business; Applied mathematics; Missing values; 0101 mathematics; education; Statistics - Methodology; 050205 econometrics; Mathematics; Pointwise; education.field_of_study; [STAT.ME] Statistics [stat]/Methodology [stat.ME]; 05 social sciences; Nonparametric statistics; Estimator; 16. Peace & justice; Missing data; Functional data; Kernel (statistics); Statistics, Probability and Uncertainty; Nonparametric estimation
researchProduct

Conditional Bias Robust Estimation of the Total of Curve Data by Sampling in a Finite Population: An Illustration on Electricity Load Curves

2020

For marketing or power grid management purposes, many studies based on the analysis of total electricity consumption curves of groups of customers are now carried out by electricity companies. Aggregated totals or mean load curves are estimated using individual curves measured on a fine time grid and collected according to some sampling design. Due to the skewness of the distribution of electricity consumption, these samples often contain outlying curves which may have an important impact on the usual estimation procedures. We introduce several robust estimators of the total consumption curve which are not sensitive to such outlying curves. These estimators are based on the conditio…

FOS: Computer and information sciences; Statistics and Probability; Population; Wavelets; Statistics - Applications; 01 natural sciences; Survey sampling; Methodology (stat.ME); 010104 statistics & probability; Kokic and Bell method; Conditional bias; 0502 economics and business; Statistics; Applications (stat.AP); 0101 mathematics; [MATH] Mathematics [math]; education; Statistics - Methodology; 050205 econometrics; Mathematics; Estimation; education.field_of_study; Modified band depth; business.industry; Applied Mathematics; 05 social sciences; Sampling (statistics); Functional data; Bootstrap; Electricity; Statistics, Probability and Uncertainty; business; asymptotic confidence bands; Social Sciences (miscellaneous); Spherical principal component analysis
researchProduct

Robust estimation of mean electricity consumption curves by sampling for small areas in presence of missing values

2017

In this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population for the entire population and for small areas. We are also interested in estimating mean curves by sampling in the presence of partially missing trajectories. Indeed, many studies carried out at the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of clients sharing some common characteristics. Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of eac…

Linear mixed models; Small area estimation; Missing data; Regression trees; [MATH.MATH-GM] Mathematics [math]/General Mathematics [math.GM]; Kernel estimators; Random forests; Conditional bias; Functional data; Survey sampling; Robustness; Nearest neighbours
researchProduct

Effects of diabetes definition on global surveillance of diabetes prevalence and diagnosis: a pooled analysis of 96 population-based studies with 331…

2015

Diabetes has been defined on the basis of different biomarkers, including fasting plasma glucose (FPG), 2-h plasma glucose in an oral glucose tolerance test (2hOGTT), and HbA1c. We assessed the effect of different diagnostic definitions on both the population prevalence of diabetes and the classification of previously undiagnosed individuals as having diabetes versus not having diabetes in a pooled analysis of data from population-based health examination surveys in different regions.

Male; endocrine system diseases; Endocrinology, Diabetes and Metabolism; medicine.medical_treatment; Global Health; 0302 clinical medicine; Endocrinology; education.field_of_study; Diabetes; //purl.org/pe-repo/ocde/ford#3.02.18 [https]; Diabetes Mellitus/blood/diagnosis/epidemiology; Biomedical and agricultural sciences; Adult; Sensitivity and Specificity; health survey; 3. Good health; priority journal; CARDIOVASCULAR-DISEASE; diabetes mellitus; medicine.medical_specialty; Blood Glucose; Survey sampling; oral glucose tolerance test; Medical sciences; Article; Internal Medicine; Effects of diabetes; Hemoglobin A Glycosylated/metabolism; 03 medical and health sciences; false positive result; SDG 3 - Good Health and Well-being; Diabetes prevalence; SYSTEMATIC ANALYSIS; Humans; human; diagnostic test accuracy study; gross national product; OLDER-ADULTS; education; Glucose tolerance test; glycosylated hemoglobin; HEMOGLOBIN A(1C) MEASUREMENT; VLAG; Glycated Hemoglobin; Hemoglobin A Glycosylated; Science & Technology; Blood Glucose/metabolism; nutritional and metabolic diseases; economic aspect; medicine.disease; glucose blood level; Glucose; age; chemistry; Faculdade de Ciências Sociais; Global surveillance of diabetes; TOLERANCE TEST; WORLDWIDE STANDARDIZATION; Biomarkers; Biomedical sciences; Settore MED/09 - Medicina Interna; Nutrition and Disease; Biomarkers/metabolism; geography; chemistry.chemical_compound; Diagnosis; Prevalence; Medicine and Health Sciences; Sentinel Surveillance; 030212 general & internal medicine; hemoglobin A1c; US POPULATION; Diabetes diagnosis; INSULIN-RESISTANCE; medicine.diagnostic_test; Research Support Non-U.S. Gov't; Q; SCREENING-TEST; health; Articles; Glucose blood; Diabetes and Metabolism; income; Population-based health examination surveys; Female; Life Sciences & Biomedicine; Population; population group; CONSENSUS STATEMENT; 030209 endocrinology & metabolism; high income region; Endocrinology & Metabolism; Insulin resistance; Research Support N.I.H. Extramural; blood; parasitic diseases; Journal Article; medicine; Life Science; ddc:613; business.industry; Insulin; body mass; Biological marker; FASTING PLASMA-GLUCOSE; Socio-biomedical sciences; business; metabolism
researchProduct

Estimate the mean electricity consumption curve by survey and take auxiliary information into account

2012

In this thesis, we are interested in estimating the mean electricity consumption curve. Since the study variable is functional and storage capacities are limited or transmission costs are high, survey sampling techniques are interesting alternatives to signal compression techniques. We extend, in this functional framework, estimation methods that take into account available auxiliary information and that can improve the accuracy of the Horvitz-Thompson estimator of the mean trajectory. The first approach uses the auxiliary information at the estimation stage: the mean curve is estimated using model-assisted estimators with functional linear regression models. The second method involves the au…

Model-assisted estimator; [MATH.MATH-GM] Mathematics [math]/General Mathematics [math.GM]; Unequal probability sampling without replacement; Functional linear model; Covariance function; Functional central limit theorem; Confidence band; Functional data; Bootstrap; Survey sampling; Hájek variance approximation; Horvitz-Thompson estimator
researchProduct

A Proposal to estimate the roaming–dog Total in an urban area through a PPSWOR spatial sampling with sample size greater than two

2018

Dogs roaming in urban areas constitute an issue for public order, hygiene and health. Proper planning of actions for health and security control and allocation of financial funds require knowledge of the roaming-dog population size in a given urban area. Unfortunately, a reliable statistical procedure for measuring such a population is not yet available in the literature. This paper presents a simple, reproducible survey sampling procedure to estimate the number of roaming dogs in an urban area through the description of a real study carried out on a restricted area of the city of Palermo in southern Italy. A sample of areas is drawn by means of draw-by-draw spatial sampling with probabilities proportional to size and without replacement (PPSWOR). As the inclusion probabilities are not available in closed form, they are estimated by a Monte Carlo approach, which is simple to implement and permits design-based variance estimation even when first-order inclusion probabilities are unknown.

Settore SECS-S/05 - Statistica Sociale
researchProduct
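
Illustration (not part of the listed paper): the abstract above mentions estimating PPSWOR inclusion probabilities by Monte Carlo and using them in design-based estimation. A minimal Python sketch of that general idea follows. It is non-spatial, and the area sizes, dog counts, sample size, number of simulations and all function names are hypothetical choices made for illustration; the paper's actual spatial design and variance estimator are not reproduced here.

```python
import random


def ppswor_sample(sizes, n, rng):
    """Draw n distinct unit indices one at a time, each draw with
    probability proportional to size among the units not yet selected."""
    remaining = list(range(len(sizes)))
    chosen = []
    for _ in range(n):
        weights = [sizes[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def estimate_inclusion_probs(sizes, n, n_sim, rng):
    """Monte Carlo estimate of first-order inclusion probabilities:
    the share of simulated samples that contain each unit."""
    counts = [0] * len(sizes)
    for _ in range(n_sim):
        for i in ppswor_sample(sizes, n, rng):
            counts[i] += 1
    return [c / n_sim for c in counts]


def horvitz_thompson_total(sample, y, pi_hat):
    """Horvitz-Thompson-type total using the estimated probabilities."""
    return sum(y[i] / pi_hat[i] for i in sample)


rng = random.Random(0)                    # fixed seed for reproducibility
sizes = [5.0, 3.0, 8.0, 2.0, 6.0, 4.0]    # hypothetical area sizes
dogs = [4, 1, 7, 0, 5, 2]                 # hypothetical counts per area
n = 3                                     # hypothetical sample size

pi_hat = estimate_inclusion_probs(sizes, n, 20000, rng)
sample = ppswor_sample(sizes, n, rng)
total_hat = horvitz_thompson_total(sample, dogs, pi_hat)
print(sample, round(total_hat, 2))
```

Because every simulated sample contains exactly n units, the estimated probabilities sum to n, which gives a quick sanity check on the simulation.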

A Three-Dimensional Object Point Process for Detection of Cosmic Filaments

2007

We propose to apply an object point process to delineate filaments of the large scale structure in redshift catalogues automatically. We illustrate the feasibility of the idea on an example of the recent 2dF Galaxy Redshift Survey, describe the procedure and characterize the results.

Statistics and Probability; 2dF Galaxy Redshift Survey; COSMIC cancer database; Computer science; Process (computing); Survey sampling; Astrophysics::Cosmology and Extragalactic Astrophysics; Astrophysics; Cosmology; Point process; Object point; Red shift; Calculus; Statistics, Probability and Uncertainty; Astrophysics::Galaxy Astrophysics; Journal of the Royal Statistical Society Series C: Applied Statistics
researchProduct

Correcting for non-ignorable missingness in smoking trends

2015

Data missing not at random (MNAR) are a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration we use data on smoking prevalence in the Finnish National FINRISK study conducted in 1972-1997. The data consist of measured survey information including missingness indicators, register-based background information and register-based time-to-disease survival data. The parameters of the missingness mechanism are estimable with these data although the original survey data are MNAR. The u…

Statistics and Probability; Background information; FOS: Computer and information sciences; ta112; Test data generation; Computer science; Survey sampling; non-participation; ta3142; Smoking prevalence; Bayesian inference; Missing data; Statistics - Applications; registry data; Methodology (stat.ME); Statistics; Survey data collection; Applications (stat.AP); Statistics, Probability and Uncertainty; Statistics - Methodology; health examination survey
researchProduct